Introduction to nonlinear optimization using factor graphs
Non-linear optimization of sparse problems is well explained by describing the problem as a factor graph. A factor graph looks like this:
Figure: Toy example of a factor graph. The round nodes are state blocks, labelled 0 to 7. The square nodes are factors encoding measured relations between the state blocks, labelled 1 to 10.
Sparsity: state blocks and factors
In a factor graph, the system’s state vector is partitioned into blocks of reasonably small size, called state blocks.
Each measurement made on the system depends on a small subset of state blocks. Each such measurement, together with the state blocks it involves, constitutes a factor.
For each factor, a function of this subset of state blocks and the measurement is used to compute a residual (also known as the ‘expectation error’, and equivalent to what the filtering literature calls the ‘innovation’).
The optimizer or Solver then tries to find the overall state that minimizes the overall residual, in a least squares sense.
The result is the optimal state, which constitutes the best available solution to the problem.
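This sparsity is easy to picture as a data structure: each factor stores only the indices of the few state blocks it involves. The following sketch is purely illustrative (the factor and block numbering does not follow the figure):

```python
# Hypothetical sketch of a factor graph's sparsity pattern: a map from
# each factor to the small set of state blocks it involves.
factor_blocks = {
    1: (0, 1),   # e.g. a relative-motion factor between blocks 0 and 1
    2: (1, 2),   # another relative-motion factor
    3: (2, 5),   # e.g. a factor relating a pose block to a landmark block
}

n_blocks = 8  # total number of state blocks in the toy graph

# Sparsity: every factor touches only a handful of the state blocks.
for factor, blocks in factor_blocks.items():
    assert len(blocks) < n_blocks
```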
Factor residuals
To make sparsity visible and exploitable, let us partition the state vector into state blocks,
\[x = (x_1, x_2, \dots, x_N)\]
Let us define the set of factors \(C_k\), each relating a subset of these blocks, \(X_k = \{x_{k_1}, \dots, x_{k_n}\}\), to a measurement \(y_k\).
Each factor \(C_k\) computes an estimation error \(e_k\),
\[e_k = e_k(X_k, y_k)\]
Note
For example, imagine the 4th measurement \(y_4\) is a function of:
the robot position \(x_1\),
the robot orientation \(x_2\),
the sensor position \(x_5\) and orientation \(x_6\) in the robot, and
the position of a landmark in the environment, \(x_{17}\).
Then the error function above reads,
\[e_4 = e_4(x_1, x_2, x_5, x_6, x_{17}, y_4)\]
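As a concrete (hypothetical) illustration, such an error function could look like this in a 2D setup where the measurement is the landmark position expressed in the sensor frame. This is not WOLF's actual API; all names and conventions here are assumptions for the example:

```python
import numpy as np

def rot2d(theta):
    """2D rotation matrix for angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def e4(x1, x2, x5, x6, x17, y4):
    """Estimation error of the (hypothetical) 4th factor.

    x1 : robot position (2-vector)       x2 : robot orientation (angle)
    x5 : sensor position in the robot    x6 : sensor orientation in the robot
    x17: landmark position in the world  y4 : measured landmark position,
                                              expressed in the sensor frame
    """
    # Landmark expressed in the robot frame, then in the sensor frame.
    lm_robot = rot2d(x2).T @ (x17 - x1)
    lm_sensor = rot2d(x6).T @ (lm_robot - x5)
    # Error = expected measurement minus actual measurement.
    return lm_sensor - y4
```

With a perfect measurement the error is zero; any mismatch between the state and the measurement shows up directly in `e4`.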
Let us also define the amount of information provided by the measurement \(y_k\) of this factor through an information matrix, \(W_k\). This is the inverse of the covariance matrix of the measurement.
Optimal state
Then, the optimal state \(x^*\) is the solution to the least-squares minimization problem,
\[x^* = \arg\min_x \sum_k e_k^\top W_k\, e_k\]
In WOLF, we define the residual as the estimation error weighted by its information matrix. We do this by noticing that, if we factorize \(W = W^{1/2} W^{\top/2}\), then
\[e^\top W e = (W^{\top/2} e)^\top (W^{\top/2} e)\]
obtaining a definition of the residual that is indeed a weighted estimation error,
\[r_k = W_k^{\top/2} e_k\]
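One such factorization is the Cholesky decomposition \(W = L L^\top\), with \(W^{\top/2} = L^\top\). A minimal numeric check (illustrative only, with made-up values):

```python
import numpy as np

# An information matrix: the inverse of the measurement covariance.
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
W = np.linalg.inv(cov)

# Cholesky factor: W = L @ L.T, so W^{1/2} = L and W^{T/2} = L.T.
L = np.linalg.cholesky(W)

e = np.array([0.3, -0.1])   # some estimation error
r = L.T @ e                 # residual: the weighted estimation error

# The squared residual equals the information-weighted squared error,
# so minimizing sum r.T r is the same as minimizing sum e.T W e.
assert np.isclose(r @ r, e @ W @ e)
```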
Iterative optimization via linearizations
Non-linear optimizers are referred to here as solvers, because they are responsible for finding the state that minimizes the residual, i.e., the optimal state.
Solvers for non-linear problems proceed iteratively:
Linearize all the factors.
For each factor \(k\):
Evaluate the residual (we consider it depends only on two state blocks indexed by i and j)
\[r_k = W_k^{\top/2}\, e_k(x_i, x_j, y_k)\]
Evaluate the Jacobians with respect to each concerned state block,
\[J^k_i = \frac{\partial r_k}{\partial x_i} \quad , \quad J^k_j = \frac{\partial r_k}{\partial x_j}\]
Append the Jacobians and the residual to a sparse full-size Jacobian matrix \(A\) and residual vector \(b\), with rows and columns matching the state indices \(i, j\) and the factor index \(k\),
\[A_{ki} = J^k_i \quad , \quad A_{kj} = J^k_j \quad , \quad b_k = r_k\]
Compute an optimal step correction \(dx\) of the linearized system,
\[dx = -(A^\top A)^{-1} A^\top b\]
Update the state \(x\) with this correction step
\[x \gets x \oplus dx\]
where \(\oplus\) is the composition law of each state block’s manifold. In effect, the update is made state block by state block:
For plain vectors, it is a regular sum
\[v \oplus \delta v = v + \delta v\]
For rotations, we use the exponential map,
\[R \oplus \delta \theta = R \cdot \exp (\delta \theta)\]
Iterate from 1. until convergence
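Putting the steps together, here is a minimal sketch of such a solver on a toy problem with three scalar state blocks (so \(\oplus\) is a plain sum) and unit information matrices. All factor types and values are illustrative, not taken from WOLF:

```python
import numpy as np

# Toy problem: three scalar state blocks x0, x1, x2 and four factors,
# each touching at most two blocks.
# Factor errors: prior e = x_i - y, relative e = x_j - x_i - y.
factors = [
    ("prior", 0, None, 0.0),   # anchors x0 at 0
    ("rel",   0, 1,    1.0),   # x1 - x0 measured as 1.0
    ("rel",   1, 2,    1.0),   # x2 - x1 measured as 1.0
    ("rel",   0, 2,    2.1),   # loop closure, slightly inconsistent
]

x = np.zeros(3)                      # initial guess
for _ in range(5):                   # iterate until convergence
    A = np.zeros((len(factors), 3))  # full-size (sparse) Jacobian
    b = np.zeros(len(factors))       # full-size residual vector
    for k, (kind, i, j, y) in enumerate(factors):
        if kind == "prior":
            b[k] = x[i] - y                  # residual (W = I here)
            A[k, i] = 1.0                    # Jacobian wrt x_i
        else:
            b[k] = x[j] - x[i] - y
            A[k, i], A[k, j] = -1.0, 1.0     # Jacobians wrt x_i, x_j
    # Optimal step of the linearized system: minimizes ||A dx + b||.
    dx = np.linalg.lstsq(A, -b, rcond=None)[0]
    x = x + dx                               # plain-vector update

print(x)  # the loop-closure surplus is spread over both odometry steps
```

Because this toy problem is linear, the first iteration already lands on the optimum; for genuinely non-linear factors, several linearize-solve-update rounds are needed.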
Note
In practice, operations 2, 3 and 4 are delegated to the Solver.
Regarding operation 1, the solver calls WOLF each time it needs to linearize a factor.
The factor’s math (residual and Jacobians) is therefore implemented in the WOLF classes deriving from FactorBase.
In the majority of cases, this math is very simple.
The power of graph-based optimization (and of WOLF, for that matter) is that, by properly organizing a large and heterogeneous collection of these simple factors, very large and complex problems can be solved efficiently.